{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "Convert PDF documents and images to high-quality markdown format using vision-language models.\n",
        "\n",
        "## Features\n",
        "\n",
        "- **LaTeX Equation Recognition**: Convert both inline and block LaTeX equations in images to markdown.\n",
        "- **Intelligent Image Description**: Generate a detailed description for all images in the document within `<img></img>` tags.\n",
        "- **Signature Detection**: Detect and mark signatures and watermarks in the document. Signatures text are extracted within `<signature></signature>` tags.\n",
        "- **Watermark Detection**: Detect and mark watermarks in the document. Watermarks text are extracted within `<watermark></watermark>` tags.\n",
        "- **Page Number Detection**: Detect and mark page numbers in the document. Page numbers are extracted within `<page_number></page_number>` tags.\n",
        "- **Checkboxes and Radio Buttons**: Converts form checkboxes and radio buttons into standardized Unicode symbols (☐, ☑, ☒).\n",
        "- **Table Detection**: Convert complex tables into html tables."
      ],
      "metadata": {
        "id": "SEyzHm3aswdG"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XKDBP7lKLyKz"
      },
      "outputs": [],
      "source": [
        "# install the docext package\n",
        "## UV tries to access these files\n",
        "!mkdir /backend-container\n",
        "!mkdir /backend-container/containers\n",
        "!touch /backend-container/containers/build.constraints\n",
        "!touch /backend-container/containers/requirements.constraints\n",
        "!uv pip install docext -q\n",
        "\n",
        "!python -m docext.app.app --model_name hosted_vllm/nanonets/Nanonets-OCR-s --gpu_memory_utilization 0.95 --max_model_len 10000 --max_gen_tokens 8000 --concurrency_limit 1 --max_num_imgs 1 --dtype float16 --server_port 9998"
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "AQw5-0E6PWn1"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}
